Abstract

A method is given that "inverts" a logic grammar and displays it
from the point of view of the logical form, rather than from that
of the word string. LR-compiling techniques are used to allow a
recursive-descent generation algorithm to perform "functor merging"
much in the same way as an LR parser performs prefix merging. This
is an improvement on the semantic-head-driven generator that results
in a much smaller search space. The amount of semantic lookahead can
be varied, and appropriate tradeoff points between table size and
resulting nondeterminism can be found automatically.



1 Introduction

With the emergence of fast algorithms and optimization techniques
for syntactic analysis, such as the use of explanation-based
learning in conjunction with LR parsing, see [Samuelsson & Rayner
1991] and subsequent work, surface generation has become a major
bottleneck in NLP systems. Surface generation is the inverse problem
of syntactic analysis and subsequent semantic interpretation. The
latter consists in constructing some semantic representation of an
input word-string based on the syntactic and semantic rules of a
formal grammar. In this article, we will limit ourselves to logic
grammars that attribute word strings with expressions in some
logical formalism represented as terms with a functor-argument
structure. The surface generation problem then consists in assigning
an output word-string to such a term. In general, both these
mappings are many-to-many: A word string that can be mapped to
several distinct logical forms is said to be ambiguous. A logical
form that can be assigned to several different word strings is said
to have multiple paraphrases.

We want to create a generation algorithm that gene- 
rates a word string by recursively descending through a 
logical form, while delaying the choice of grammar rules 
to apply as long as possible. This means that we want
to process different rules or rule combinations that introduce
the same piece of semantics in parallel until they
branch apart. This will reduce the amount of spurious 
search, since we will gain more information about the 
rest of the logical form before having to commit to a 
particular grammar rule. 

In practice, this means that we want to perform "functor
merging" much in the same way as an LR parser performs
prefix merging by employing parsing tables compiled
from the grammar. One obvious way of doing this is
to use LR-compilation techniques to compile generation 
tables. This will however require that we reformulate 
the grammar from the point of view of the logical form, 
rather than from that of the word string from which it 
is normally displayed. 

This gives us the following working plan: We will first 
review basic LR compilation of parsing tables in Sec- 
tion 2. The grammar-inversion procedure turns out to 
be most easily explained in terms of the semantic-head- 
driven generation (SHDG) algorithm. We will therefore 
proceed to outline the SHDG algorithm in Section 3. 
The grammar inversion itself is described in Section 4, 
while LR compilation of generation tables is discussed 
in Section 5. The generation algorithm is presented in 
Section 6 together with techniques for optimizing the ge- 
neration tables. Section 7, finally, discusses the findings. 

2 LR Compilation for Parsing 

LR compilation in general is well described in, for example,
[Aho et al. 1986], pp. 215-247. Here we will only
sketch out the main ideas.

An LR parser is basically a pushdown automaton, i.e., 
it has a pushdown stack in addition to a finite set of in- 
ternal states and a reader head for scanning the input 
string from left to right one symbol at a time. The stack 
is used in a characteristic way: The items on the stack 
consist of alternating grammar symbols and states. The 
current state is simply the state on top of the stack. The 
most distinguishing feature of an LR parser is however 
the form of the transition relation — the action and goto 
tables. A nondeterministic LR parser can in each step 
perform one of four basic actions. In state S with look- 
ahead symbol 1 Sym it can: 



1 The lookahead symbol is the next symbol in the input
string, i.e., the symbol under the reader head. 



S → S QM
S → NP VP
VP → VP PP
VP → VP AdvP
VP → Vi
VP → Vt NP
PP → P NP

NP → John
NP → Mary
NP → Paris
Vi → sleeps
Vt → sees
P → in
AdvP → today
QM → ?

Figure 1: Sample grammar

1. accept: Halt and signal success. 

2. error: Fail and backtrack. 

3. shift S2: Consume the input symbol Sym, push it 
onto the stack, and transit to state S2 by pushing 
it onto the stack. 

4. reduce R: Pop off two items from the stack for
each phrase in the RHS of grammar rule R, inspect
the stack for the old state S1 now on top of the
stack, push the LHS of rule R onto the stack, and
transit to state S2 determined by goto(S1,LHS,S2)
by pushing S2 onto the stack.

Consider the small sample grammar given in Figure 1.
To make this simple grammar more interesting, the addition
of a question mark (QM) to the end of a sentence
(S), as in John sleeps?, is interpreted as a yes-no question
version of S by the recursive Rule 1, S → S QM.

Each internal state consists of a set of dotted items.
Each item in turn corresponds to a grammar rule. The
current string position is indicated by a dot. For example,
Rule 2, S → NP VP, yields the item S ⇒ NP • VP,
which corresponds to just having found an NP and now
searching for a VP.

In the compilation phase, new states are induced from 
old ones: For the indicated string position, a possible 
grammar symbol is selected and the dot is advanced one 
step in all items where this particular grammar sym- 
bol immediately follows the dot, and the resulting new 
items will constitute the kernel of the new state. Non- 
kernel items are added to these by selecting grammar ru- 
les whose LHS match grammar symbols at the new string 
position in the new items. In each non-kernel item, the 
dot is at the beginning of the rule. If a set of items is 
constructed that already exists, then this search branch 
is abandoned and the recursion terminates. 

The state-construction phase starts off by creating an 
initial set consisting of a single dummy kernel item and 
its non-kernel closure. This is State 1 in Figure 2. The 
dummy item introduces a dummy top grammar symbol 
as its LHS, while the RHS consists of the old top symbol, 
and the dot is at the beginning of the rule. In the example,
this is the item S' ⇒ • S. The rest of the states are
induced from the initial state. The states resulting from 
the sample grammar of Figure 1 are shown in Figure 2. 
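
The state-construction procedure just described can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than the compiler used in the article: items are encoded as (rule index, dot position) pairs, the function names are invented here, the lexicon is left out so that preterminals behave as terminals, and the states are numbered from 0 in order of discovery, so the numbering need not coincide with that of Figure 2.

```python
# Phrase-structure rules of Figure 1; preterminals act as terminals here.
RULES = [
    ("S'", ("S",)),                      # dummy top rule
    ("S", ("S", "QM")), ("S", ("NP", "VP")),
    ("VP", ("VP", "PP")), ("VP", ("VP", "AdvP")),
    ("VP", ("Vi",)), ("VP", ("Vt", "NP")),
    ("PP", ("P", "NP")),
]

def closure(kernel):
    """Add non-kernel items: rules whose LHS follows a dot, dot at start."""
    items, agenda = set(kernel), list(kernel)
    while agenda:
        rule, dot = agenda.pop()
        rhs = RULES[rule][1]
        if dot < len(rhs):
            for i, (lhs, _) in enumerate(RULES):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    agenda.append((i, 0))
    return frozenset(items)

def advance(state, sym):
    """Kernel of the successor state: move the dot over `sym`."""
    return {(r, d + 1) for r, d in state
            if d < len(RULES[r][1]) and RULES[r][1][d] == sym}

def build_states():
    states = [closure({(0, 0)})]         # initial state from the dummy item
    goto = {}
    for state in states:                 # the list grows while we iterate
        symbols = sorted({RULES[r][1][d] for r, d in state
                          if d < len(RULES[r][1])})
        for sym in symbols:
            new = closure(advance(state, sym))
            if new not in states:        # an existing set ends this branch
                states.append(new)
            goto[(states.index(state), sym)] = states.index(new)
    return states, goto

states, goto = build_states()
print(len(states))
```

For the sample grammar this construction yields twelve item sets, the same number as in Figure 2.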

In conjunction with grammar formalisms employing 
complex feature structures, this procedure is associated 
with a number of interesting problems, many of which 
are discussed in [Nakazawa 1991] and [Samuelsson 1994]. 
For example, the termination criterion must be modified: 





State 1
S' ⇒ • S
S ⇒ • S QM
S ⇒ • NP VP

State 2
S' ⇒ S •
S ⇒ S • QM

State 3
S ⇒ NP • VP
VP ⇒ • VP PP
VP ⇒ • VP AdvP
VP ⇒ • Vi
VP ⇒ • Vt NP

State 4
S ⇒ NP VP •
VP ⇒ VP • PP
VP ⇒ VP • AdvP
PP ⇒ • P NP

State 5
VP ⇒ VP PP •

State 6
VP ⇒ VP AdvP •

State 7
PP ⇒ P • NP

State 8
PP ⇒ P NP •

State 9
VP ⇒ Vi •

State 10
VP ⇒ Vt • NP

State 11
VP ⇒ Vt NP •

State 12
S ⇒ S QM •

Figure 2: LR-parsing states for the sample grammar



If a new set of items is constructed that is more specific 
than an existing one, then this search branch is aban- 
doned and the recursion terminates. If it on the other 
hand is more general, then it replaces the old one. 

3 Semantic-Head-Driven Generation 

Generators found in large-scale systems such as the 
DFKI DISCO system, [Uszkoreit et al. 1994], or the SRI 
Core Language Engine, [Alshawi (ed.) 1992], pp. 268- 
275, tend typically to be based on the semantic-head- 
driven generation (SHDG) algorithm. The SHDG algo- 
rithm is well-described in [Shieber et al. 1990]; here we 
will only outline the main features. 

The grammar rules of Figure 1 have been attributed
with logical forms as shown in Figure 3. The notation
has been changed so that each constituent consists
of a quadruple (Cat, Sem, W0, W1), where W0 and
W1 form a difference list representing the word string
that Cat spans, and Sem is the logical form. For example,
the logical form corresponding to the LHS S of the
(S, mod(X,Y), W0, W) → (S, X, W0, W1) (QM, Y, W1, W)
rule consists of a modifier Y added to the logical form
X of the RHS S. As we can see from the last grammar
rule, this modifier is in turn realized as ynq.

For the SHDG algorithm, the grammar is divided into 
chain rules and non-chain rules: Chain rules have a di- 
stinguished RHS constituent, the semantic head, that 
has the same logical form as the LHS constituent, modulo
λ-abstractions; non-chain rules lack such a constituent.
In particular, lexicon entries are non-chain rules,
since they do not have any RHS constituents at all. This 
distinction is made since the generation algorithm treats 
the two rule types quite differently. In the example gram- 
mar, rules 2 and 5 through 7 are chain rules, while the 
remaining ones are non-chain rules. 
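
The division into chain and non-chain rules can be illustrated with a small sketch. The tuple encoding of logical forms and the helper names are assumptions made for this example, and "the same logical form modulo λ-abstractions" is approximated by syntactic equality after stripping leading abstractions, which suffices for the sample grammar but is not a general unification test.

```python
# Logical forms as nested tuples: ("lam", var, body) stands for var^body,
# (functor, arg, ...) for a compound term, and a plain string for a variable.
def lam(var, body):
    return ("lam", var, body)

def strip_lambdas(lf):
    """Remove leading lambda abstractions: X^Y^body -> body."""
    while isinstance(lf, tuple) and lf[0] == "lam":
        lf = lf[2]
    return lf

def semantic_head(lhs_lf, rhs):
    """Index of the RHS constituent sharing the LHS logical form, or None."""
    for i, (_cat, lf) in enumerate(rhs):
        if strip_lambdas(lf) == strip_lambdas(lhs_lf):
            return i
    return None

# Phrase-structure rules 1-7 of Figure 3: number -> (LHS form, RHS list).
RULES = {
    1: (("mod", "X", "Y"), [("S", "X"), ("QM", "Y")]),
    2: ("Y", [("NP", "X"), ("VP", lam("X", "Y"))]),
    3: (lam("X", ("mod", "Y", "Z")), [("VP", lam("X", "Y")), ("AdvP", "Z")]),
    4: (lam("X", ("mod", "Y", "Z")), [("VP", lam("X", "Y")), ("PP", "Z")]),
    5: ("X", [("Vi", "X")]),
    6: ("Y", [("Vt", lam("X", "Y")), ("NP", "X")]),
    7: ("Y", [("P", lam("X", "Y")), ("NP", "X")]),
}

chain_rules = [n for n, (lf, rhs) in RULES.items()
               if semantic_head(lf, rhs) is not None]
print(chain_rules)   # numbers of the rules that have a semantic head
```

Lexicon entries have an empty RHS and would therefore trivially come out as non-chain rules under this test.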



1. (S, mod(X,Y), W0, W) → (S, X, W0, W1) (QM, Y, W1, W)
2. (S, Y, W0, W) → (NP, X, W0, W1) (VP, X^Y, W1, W)
3. (VP, X^mod(Y,Z), W0, W) → (VP, X^Y, W0, W1) (AdvP, Z, W1, W)
4. (VP, X^mod(Y,Z), W0, W) → (VP, X^Y, W0, W1) (PP, Z, W1, W)
5. (VP, X, W0, W) → (Vi, X, W0, W)
6. (VP, Y, W0, W) → (Vt, X^Y, W0, W1) (NP, X, W1, W)
7. (PP, Y, W0, W) → (P, X^Y, W0, W1) (NP, X, W1, W)

(NP, john, [John|W], W) → John
(NP, mary, [Mary|W], W) → Mary
(NP, paris, [Paris|W], W) → Paris
(Vi, X^sleep(X), [sleeps|W], W) → sleeps
(Vt, X^Y^see(X,Y), [sees|W], W) → sees
(P, X^in(X), [in|W], W) → in
(AdvP, today, [today|W], W) → today
(QM, ynq, [?|W], W) → ?

Figure 3: Sample grammar with semantics



Functor-introducing rules

1. (S, mod(X,Y), W0, W, ε, ε) →
       (S, X, W0, W1, ε, ε) (QM, Y, W1, W, ε, ε)
3. (VP, X^mod(Y,Z), W0, W, A0, A) →
       (VP, X^Y, W0, W1, A0, A) (AdvP, Z, W1, W, ε, ε)
4. (VP, X^mod(Y,Z), W0, W, A0, A) →
       (VP, X^Y, W0, W1, A0, A) (PP, Z, W1, W, ε, ε)
(NP, john, [John|W], W, A, ε) → A
(NP, mary, [Mary|W], W, A, ε) → A
(NP, paris, [Paris|W], W, A, ε) → A
(Vi, X^sleep(X), [sleeps|W], W, A, ε) → A
(Vt, X^Y^see(X,Y), [sees|W], W, A, ε) → A
(P, X^in(X), [in|W], W, A, ε) → A
(AdvP, today, [today|W], W, A, ε) → A
(QM, ynq, [?|W], W, A, ε) → A

Argument-filling rules

2. (S, Y, W0, W, ε, ε) →
       (VP, X^Y, W1, W, [(NP, X, W0, W1)], ε)
5. (VP, X, W0, W, A0, A) →
       (Vi, X, W0, W, A0, A)
6. (VP, Y, W0, W, A0, A) →
       (Vt, X^Y, W0, W1, [(NP, X, W1, W)|A0], A)
7. (PP, Y, W0, W, A0, A) →
       (P, X^Y, W0, W1, [(NP, X, W1, W)|A0], A)



A simple semantic-head-driven generator might work 
as follows: Given a grammar symbol and a piece of lo- 
gical form, the generator looks for a non-chain rule with 
the given semantics. The constituents of the RHS of 
that rule are then generated recursively, after which the 
LHS is connected to the given grammar symbol using 
chain rules. At each application of a chain rule, the rest 
of the RHS constituents, i.e., the non-head constituents, 
are generated recursively. The particular combination 
of connecting chain rules used is often referred to as a 
chain. The generator starts off with the top symbol of 
the grammar and the logical form corresponding to the 
string that is to be generated. 

The inherent problem with the SHDG algorithm is 
that each rule combination is tried in turn, while the 
possibilities of prefiltering are rather limited, leading to 
a large amount of spurious search. The generation al- 
gorithm presented in the current article does not suffer 
from this problem; what the new algorithm in effect does 
is to process all chains from a particular set of grammar 
symbols down to some particular piece of logical form in 
parallel before any rule is applied, rather than to con- 
struct and try each one separately in turn. 

4 Grammar Inversion 

Before we can invert the grammar, we must put it in 
normal form. We will use a variant of chain and non- 
chain rules, namely functor-introducing rules correspon- 
ding to non-chain rules, and argument-filling rules corre- 
sponding to chain rules. The inversion step is based on 
the assumption that there are no other types of rules. 

Since the generator will work by recursive descent
through the logical form, we wish to rearrange the grammar
so that arguments are generated together with their
functors. To this end we introduce another difference
list A0 and A to pass down the arguments introduced
by argument-filling rules to the corresponding
functor-introducing rules. Here the latter rules are assumed to
be lexical, following the tradition in GPSG where the
presence of the SUBCAT feature implies a preterminal
grammar symbol (see [Gazdar et al. 1985], p. 33), but
this is really immaterial for the algorithm.

Figure 4: Sample grammar in normal form

The grammar of Figure 3 is shown in normal form 
in Figure 4. The grammar is compiled into this form 
by inspecting the flow of arguments through the logical 
forms of the constituents of each rule. In the functor- 
introducing rules, the RHS is rearranged to mirror the 
argument order of the LHS logical form. The argument- 
filling rules have only one RHS constituent — the seman- 
tic head — and the rest of the original RHS constituents 
are added to the argument list of the head constituent. 
Note, for example, how the NP is added to the argument 
list of the VP in Rule 2, or to the argument list of the 
P in Rule 7. This is done automatically, although cur- 
rently, the exact flow of arguments is specified manually. 

We assume that there are no purely argument-filling 
cycles. For rules that actually fill in arguments, this 
is obviously impossible, since the number of arguments 
decreases strictly. For the slightly degenerate case of
argument-filling rules which only pass along the logical
form, such as the (VP, X) → (Vi, X) rule, this is equivalent
to the off-line parsability requirement, see [Kaplan
& Bresnan 1982], pp. 264-266. 2 We require this in order
to avoid an infinite number of chains, since each possible 
chain will be expanded out in the inversion step. Since 
subcategorization lists of verbs are bounded in length, 
PATR II style VP rules do not pose a serious problem, 



2 If the RHS Vi were a VP, we would have a purely
argument-filling cycle of length 1.



(S, mod(X,Y), W0, W, ε, ε) →
    (S, X, W0, W1, ε, ε) (QM, Y, W1, W, ε, ε)
(S, mod(Y,Z), W0, W, ε, ε) →
    (VP, X^Y, W1, W2, [(NP, X, W0, W1)], ε)
    (AdvP, Z, W2, W, ε, ε)
(S, mod(Y,Z), W0, W, ε, ε) →
    (VP, X^Y, W1, W2, [(NP, X, W0, W1)], ε)
    (PP, Z, W2, W, ε, ε)
(VP, X^mod(Y,Z), W1, W, [(NP, X, W0, W1)], ε) →
    (VP, X^Y, W1, W2, [(NP, X, W0, W1)], ε)
    (AdvP, Z, W2, W, ε, ε)
(VP, X^mod(Y,Z), W1, W, [(NP, X, W0, W1)], ε) →
    (VP, X^Y, W1, W2, [(NP, X, W0, W1)], ε)
    (PP, Z, W2, W, ε, ε)
(S, sleep(X), W0, W, ε, ε) → (NP, X, W0, [sleeps|W], ε, ε)
(VP, X^sleep(X), [sleeps|W], W, [(NP, X, W0, [sleeps|W])], ε) →
    (NP, X, W0, [sleeps|W], ε, ε)
(S, see(X,Y), W0, W, ε, ε) →
    (NP, X, W1, W, ε, ε) (NP, Y, W0, [sees|W1], ε, ε)
(VP, Y^see(X,Y), [sees|W0], W, [(NP, Y, W1, [sees|W0])], ε) →
    (NP, X, W0, W, ε, ε) (NP, Y, W1, [sees|W0], ε, ε)
(PP, in(X), [in|W0], W, ε, ε) → (NP, X, W0, W, ε, ε)
(NP, john, [John|W], W, ε, ε) → ε
(NP, mary, [Mary|W], W, ε, ε) → ε
(NP, paris, [Paris|W], W, ε, ε) → ε
(AdvP, today, [today|W], W, ε, ε) → ε
(QM, ynq, [?|W], W, ε, ε) → ε

Figure 5: Inverted sample grammar



which on the other hand the "adjunct-as-argument" ap- 
proach taken in [Bouma & van Noord 1994] may do. 
However, this problem is common to a number of other 
generation algorithms, including the SHDG algorithm. 

Let us return to the scenario for the SHDG algorithm 
given at the end of Section 3: We have a piece of logical 
form and a grammar symbol, and we wish to connect 
a non-chain rule with this particular logical form to the 
given grammar symbol through a chain. We will gene- 
ralize this scenario just slightly to the case where a set 
of grammar symbols is given, rather than a single one. 

Each inverted rule will correspond to a particular 
chain of argument-filling (chain) rules connecting a 
functor-introducing (non-chain) rule introducing this lo- 
gical form to a grammar symbol in the given set. The 
arguments introduced by this chain will be collected and 
passed down to the functors that consume them in or- 
der to ensure that each of the inverted rules has a RHS 
matching the structure of the LHS logical form. The nor- 
malized sample grammar of Figure 4 will result in the 
inverted grammar of Figure 5. Note how the right-hand 
sides reflect the argument structure of the left-hand-side 
logical forms. As mentioned previously, the collected ar- 
guments are currently assumed to correspond to functors 
introduced by lexical entries, but the procedure can rea- 
dily be modified to accommodate grammar rules with a 
non-empty RHS, where some of the arguments are con- 
sumed by the LHS logical form. 

The grammar inversion step is combined with the LR- 
compilation step. This is convenient for several rea- 
sons: Firstly, the termination criteria and the database 
maintenance issues are the same in both steps. Secondly, 



State 1
(S', f(X), ε, ε) ⇒ • (S, X, ε, ε)

State 2
(S, mod(X,Y), ε, ε) ⇒ • (S, X, ε, ε) (QM, Y, ε, ε)
(S, mod(Y,Z), ε, ε) ⇒ • (VP, X^Y, [(NP, X)], ε) (AdvP, Z, ε, ε)
(S, mod(Y,Z), ε, ε) ⇒ • (VP, X^Y, [(NP, X)], ε) (PP, Z, ε, ε)

State 3
(S, mod(X,Y), ε, ε) ⇒ • (S, X, ε, ε) (QM, Y, ε, ε)
(S, mod(Y,Z), ε, ε) ⇒ • (VP, X^Y, [(NP, X)], ε) (AdvP, Z, ε, ε)
(S, mod(Y,Z), ε, ε) ⇒ • (VP, X^Y, [(NP, X)], ε) (PP, Z, ε, ε)
(VP, X^mod(Y,Z), [(NP, X)], ε) ⇒
    • (VP, X^Y, [(NP, X)], ε) (AdvP, Z, ε, ε)
(VP, X^mod(Y,Z), [(NP, X)], ε) ⇒
    • (VP, X^Y, [(NP, X)], ε) (PP, Z, ε, ε)

Figure 6: The first three generation states

since the LR-compilation step employs a top-down rule- 
invocation scheme, this will ensure that the arguments 
are passed down to the corresponding functors. In fact, 
invoking inverted grammar rules merely requires first in- 
voking a chain of argument-filling rules and then termi- 
nating it with a functor-introducing rule. 

5 LR Compilation for Generation 

Just as when compiling LR-parsing tables, the compiler 
operates on sets of dotted items. Each item consists of 
a partially processed inverted grammar rule, with a dot 
marking the current position. Here the current position 
is an argument position of the LHS logical form, rather 
than some position in the input string. 

New states are induced from old ones: For the indica- 
ted argument position, a possible logical form is selected 
and the dot is advanced one step in all items where this 
particular logical form can occur in the current argument 
position, and the resulting new items constitute a new 
state. All possible grammar symbols that can occur in 
the old argument position and that can have this logi- 
cal form are then collected. From these, all rules with a 
matching LHS are invoked from the inverted grammar. 
Each such rule will give rise to a new item where the dot 
marks the first argument position, and the set of these 
new items will constitute another new state. If a new 
set of items is constructed that is more specific than an 
existing one, then this search branch is abandoned and 
the recursion terminates. If it on the other hand is more 
general, then it replaces the old one. 

The state-construction phase starts off by creating 
an initial set consisting of a single dummy item with 
a dummy top grammar symbol and a dummy top logi- 
cal form, corresponding to a dummy inverted grammar 
rule. In the sample grammar, this would be the rule 
(S', f(X), W0, W, ε, ε) → (S, X, W0, W, ε, ε). The dot is

at the beginning of the rule, selecting the first and only 
argument. The rest of the states are induced from this 
one. The first three states resulting from the inverted 
grammar of Figure 5 are shown in Figure 6, where the 
difference lists representing the word strings are omitted. 

The sets of items are used to compile the generation 



tables in the same way as is done for LR parsing. The
goto entries correspond to transiting from one argument 
of a term to the next, and thus advancing the dot one 
step. The reductions correspond to applying the rules of 
items that have the dot at the end of the RHS, as is the 
case when LR parsing. There is no obvious analogy to 
the shift action — the closest thing would be the descend 
actions transiting from a functor to one of its arguments. 

Note that there is no need to include the logical form 
of each lexicon entry in the generation tables. Instead, a 
typing of the logical forms can be introduced, and a re- 
presentative of each type used in the actual tables, rather 
than the individual logical forms. This decreases the size 
of the tables drastically. For example, there is no point 
in distinguishing the states reached by traversing john, 
mary and paris, apart from ensuring that the correct 
word is added to the output word-string. This is accom- 
plished much in the same way as preterminals, rather 
than individual words, figure in LR-parsing tables. 
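
The effect of typing the logical forms can be illustrated as follows. This is a hedged sketch: the type names, the state names and the dictionary layout are invented for the example and do not come from the article.

```python
# Hypothetical typing of lexical logical forms: constants of the same type
# share one representative in the tables; the word is looked up separately.
LEXICON = {
    "john": ("np_const", "John"),
    "mary": ("np_const", "Mary"),
    "paris": ("np_const", "Paris"),
    "today": ("advp_const", "today"),
    "ynq": ("qm_const", "?"),
}

def table_key(lf):
    """Representative used in the tables instead of the individual form."""
    return LEXICON[lf][0] if lf in LEXICON else lf

def word_for(lf):
    """Word added to the output string for a lexical logical form."""
    return LEXICON[lf][1]

# One table entry now covers john, mary and paris alike. The state names
# "s_np" and "s_np_done" are made up for this illustration.
GOTO = {("s_np", "np_const"): "s_np_done"}

for lf in ("john", "mary", "paris"):
    print(word_for(lf), GOTO[("s_np", table_key(lf))])
```

The table holds a single entry for the three proper names, while the output word is still chosen individually.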

6 A New Generation Algorithm 

The generator works by recursive descent through the 
logical form while transiting between internal states. It 
is driven by the descend, goto and reduce tables. A pu- 
shdown stack is used to store intermediate constituents. 

When generating a word string, the current state and 
logical form determine a transition to a new state, cor- 
responding to the first argument of the logical form, 
through the descend table. A substring is generated re- 
cursively from the argument logical form, and this con- 
stituent is pushed onto the stack. The argument logical 
form, together with the new current state, determine a 
transition to the next state through the goto table. The 
next state corresponds to the next argument of the ori- 
ginal logical form, and another substring is generated 
from this argument logical form, etc. When no more 
arguments remain, an inverted grammar rule is selected 
nondeterministically by the reduce table and applied to 
the top portion of the stack, constructing a word string 
corresponding to the original logical form and comple- 
ting this generation cycle. 3 
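
The control flow of one generation cycle (descend into each argument, push the resulting substring, then reduce) can be sketched as below. This is a deliberately simplified, deterministic illustration and not the table-driven generator itself: the internal states and the descend, goto and reduce tables are collapsed into a single hypothetical REDUCE dictionary indexed by functor and arity, logical forms are encoded as nested tuples, and the word orders are our reading of Figure 5.

```python
# Hypothetical "reduce" step per functor: how the substrings generated from
# the arguments are combined. Word orders are read off Figure 5.
REDUCE = {
    ("mod", 2): lambda x, y: x + y,              # S/VP followed by modifier
    ("see", 2): lambda x, y: y + ["sees"] + x,   # see(X,Y): Y is the subject
    ("sleep", 1): lambda x: x + ["sleeps"],
    ("in", 1): lambda x: ["in"] + x,
}
WORDS = {"john": "John", "mary": "Mary", "paris": "Paris",
         "today": "today", "ynq": "?"}

def generate(lf):
    """Generate a word list from a logical form given as nested tuples."""
    if isinstance(lf, str):                  # lexical constant
        return [WORDS[lf]]
    functor, args = lf[0], lf[1:]
    stack = []
    for arg in args:                         # descend, then go to next arg
        stack.append(generate(arg))
    combine = REDUCE[(functor, len(args))]   # reduce on top of the stack
    return combine(*stack)

lf = ("mod", ("mod", ("see", "mary", "john"), ("in", "paris")), "ynq")
print(" ".join(generate(lf)))
```

In the actual algorithm the reduce table may offer several inverted rules for the same functor, and the choice among them is nondeterministic; the sketch hides this by fixing one combination per functor.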

The logical form can be inspected down to an arbitrary 
depth of recursion when compiling the sets of items, and 
this parameter can be varied. This is closely related to 
the use of lookahead symbols in an LR parser; increasing 
the depth is analogous to increasing the number of look- 
ahead symbols. The amount of semantic lookahead is 
reflected in the goto and descend entries. The key para- 
meter influencing the generation speed is the amount of 
nondeterminism in each "reductive state", i.e., each state 
where the dot is at the end of some rule. Increased se- 
mantic lookahead will split potential nondeterminism in 
the resulting reductive states into distinct sets of items, 
yielding reductive states with less nondeterminism. 

No semantic lookahead would mean only taking the 
functor of the logical form into consideration, and in the 

3 This is a bottom-up rule invocation scheme. It could 
easily be modified so that a rule is instead applied before 
constructing the substrings recursively, resulting in a top- 
down rule-invocation scheme, which might be a good idea in 
conjunction with semantic lookahead, see the following. 



descend(1, mod(mod(_,_), ynq), 2A).
descend(1, mod(see(_,_), ynq), 2B).
descend(1, mod(sleep(_), ynq), 2C).

State 2A
(S, mod(mod(X,Y), ynq), ε, ε) ⇒
    • (S, mod(X,Y), ε, ε) (QM, ynq, ε, ε)

State 2B
(S, mod(see(X,Y), ynq), ε, ε) ⇒
    • (S, see(X,Y), ε, ε) (QM, ynq, ε, ε)

State 2C
(S, mod(sleep(X), ynq), ε, ε) ⇒
    • (S, sleep(X), ε, ε) (QM, ynq, ε, ε)

Figure 7: Alternative generation states

descend(1, mod(_, ynq), 2).

State 2
(S, mod(X, ynq), ε, ε) ⇒ • (S, X, ε, ε) (QM, ynq, ε, ε)

Figure 8: Alternative alternative generation states

running example, a typical action table entry would be
descend(1, mod(_,_), 2). 4 This would mean that the
generator would operate on State 2 of Figure 6 when generating
from the first argument of the mod(_,_) term,
and both the S alternative and the (merged) VP alternative(s)
would be attempted nondeterministically.

By taking the arguments of the logical form into ac- 
count, the degree of nondeterminism can be reduced, and 
for the sample grammar used throughout this article, it 
is eliminated completely. In the example, if the second 
argument of the mod(_,_) term is ynq, then only the 
S alternative will be considered when generating from 
the first argument, since the relevant states and descend 
entries will be those of Figure 7. 
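
How the amount of semantic lookahead shows up in the descend entries can be sketched with a small pattern matcher. The tuple encoding of terms, the function names and the list layout of the table are assumptions made for this illustration; the entries themselves are the one quoted above for no lookahead and those of Figure 7.

```python
ANY = "_"   # don't-care variable

def matches(pattern, term):
    """True if `term` is an instance of `pattern` (ANY matches anything)."""
    if pattern == ANY:
        return True
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        return (len(pattern) == len(term) and
                all(matches(p, t) for p, t in zip(pattern, term)))
    return pattern == term

# Descend entries as (state, pattern, new state). NO_LOOKAHEAD is the
# single entry quoted in the text; LOOKAHEAD holds the entries of Figure 7.
NO_LOOKAHEAD = [(1, ("mod", ANY, ANY), "2")]
LOOKAHEAD = [
    (1, ("mod", ("mod", ANY, ANY), "ynq"), "2A"),
    (1, ("mod", ("see", ANY, ANY), "ynq"), "2B"),
    (1, ("mod", ("sleep", ANY), "ynq"), "2C"),
]

def descend(table, state, lf):
    """All states reachable from `state` when descending into `lf`."""
    return [new for s, pat, new in table if s == state and matches(pat, lf)]

lf = ("mod", ("sleep", "john"), "ynq")
print(descend(NO_LOOKAHEAD, 1, lf))
print(descend(LOOKAHEAD, 1, lf))
```

With deeper patterns the same logical form selects a more specific state, at the cost of more entries in the table.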

The optimal depth may vary for each individual table 
entry, and even within it, and a scheme has been devised 
to automatically find such an optimum by inspecting the 
number of items left in each reductive state. The scheme 
employs a greedy algorithm with iterative deepening to 
this end. In the running example, the first argument 
of mod(_,_) contributes no important information when 
descending from State 1, while the second one does. The 
scheme correctly finds the optimal depths when transi- 
ting from State 1, resulting in the State 2 and descend 
entry of Figure 8. This is described in detail elsewhere. 

7 Summary and Discussion 

The proposed algorithm is an improvement on the
semantic-head-driven generation algorithm that allows
"functor merging", i.e., enables processing various grammar
rules, or rule combinations, that introduce the same
semantic structure simultaneously, thereby greatly reducing
the search space. The algorithm proceeds by recursive
descent through the logical form, and using the [...]

4 Here "_" denotes a don't-care variable.

What the new algorithm in effect does is to process all chains from a
particular set of grammar symbols down to some parti- 
cular piece of logical form in parallel until a reduction 
is attempted, rather than to construct and try each one 
separately in turn. This requires a grammar-inversion 
technique that is fundamentally different from techni- 
ques such as the essential-argument algorithm, see the 
following, since it must display the grammar from the 
point of view of the logical form, rather than from that of 
the word string. LR-compilation techniques accomplish 
the functor merging by compiling the inverted grammar 
into a set of generation tables. 

The set of applicable reductions can be reduced by 
using more semantic lookahead, at the price of a larger 
number of internal states, and there is in general a trade- 
off between the size of the resulting generation tables and 
the amount of nondeterminism when reducing. The em- 
ployed amount of semantic lookahead can be varied, and 
a scheme has been devised and tested that automatically 
determines appropriate tradeoff points, optionally based 
on a collection of training examples. 

The grammar inversion rearranges the grammar as a 
whole according to the functor-argument structure of 
the logical forms. Other inversion schemes, such as the 
essential-argument algorithm, see [Strzalkowski 1990] or 
the direct-inversion approach, see [Minnen et al. Forth- 
coming], are mainly concerned with locally rearranging 
the order of the RHS constituents of individual gram- 
mar rules by examining the flow of information through 
these constituents, to ensure termination and increase 
efficiency. Although this can occasionally change the set 
of RHS symbols in a rule, it is done to these ends, rather 
than to reflect the functor-argument structure. 

Some hand editing is necessary when preparing the 
grammar for the inversion step, but it is limited to spe- 
cifying the flow of arguments in the grammar rules. Fur- 
thermore, this could potentially be fully automated. 

Although the sample grammar used throughout the 
article is essentially context-free, there is nothing in 
principle that restricts the method to such grammars. 
In fact, the method could be extended to grammars em- 
ploying complex feature structures as easily as the LR- 
parsing scheme itself, see for example [Nakazawa 1991], 
and this is currently being done. 

The method has been implemented and applied to 
much more complex grammars than the simple one used 
as an example in this article, and it works excellently. 
Although these grammars are still too naive to form the 
basis of a serious empirical evaluation lending substantial 
experimental support to the method as a whole, it should 
be obvious from the algorithm itself that the reduction in 
search space compared to the SHDG algorithm is most 
substantial. Nonetheless, such an evaluation is a top- 
priority item on the future-work agenda. 